Then I would like to move on to partially observable MDPs.
This is the last thing we're going to add; it makes our life harder as an agent, but it also makes us perform better as an agent, because we're essentially adding noisy sensors.
In our example, the change is relatively simple.
We take the same environment, the same kind of rewards, all of that.
But instead of always knowing exactly where the walls are, so that if I'm in a given square my sensor says "there's a wall to the north, there's a wall to the south", we now add just one thing: maybe.
I think there's a wall here, and I think there's a wall there, but I might be wrong.
That's the big new addition.
If you think of this as a maze, just add some fog, or dark glasses, or something like that: you're no longer really sure.
What we need to add is essentially a non-deterministic sensor model, and that's something we already know how to do.
We've seen it before: in our umbrella example, that was exactly what we had.
We had a sensor model with a hidden variable, namely the weather outside: is it raining or shining?
And we had the observable outcome, namely whether the director brings an umbrella or not.
And the director is unreliable.
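To make that concrete, here is a minimal sketch of such a sensor model as a lookup table; the 0.9 and 0.2 values are the usual textbook numbers for the umbrella example, assumed here rather than taken from this lecture.

```python
# Umbrella sensor model as P(observation | hidden state).
# The 0.9 / 0.2 probabilities are assumed textbook values, not the lecture's.
UMBRELLA_SENSOR = {
    True:  {True: 0.9, False: 0.1},   # raining: umbrella seen 90% of the time
    False: {True: 0.2, False: 0.8},   # not raining: umbrella still seen 20% of the time
}

# P(umbrella observed | raining):
p = UMBRELLA_SENSOR[True][True]       # 0.9
```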
It's the same situation here.
We have a state, and that state is hidden: we don't know what the state of the world is, we only hold certain beliefs about it.
But we also have evidence.
The evidence, and this is very important to see, is something we can observe accurately.
In the umbrella example, we're assuming that we can accurately sense whether the director brings the umbrella or not; it's only the director who is unreliable.
If my eyes are unreliable too, we can always factor that into the director's probabilities, that is, into the sensor model.
So what we're really looking at is the probability of seeing a certain piece of evidence given a state.
Or, if we know enough priors, we can turn that around via Bayes' rule.
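Written out, with the standard notation assumed, the sensor model specifies P(e | s), and Bayes' rule inverts it into a belief over states:

```latex
% Sensor model P(e \mid s) inverted via Bayes' rule, assuming a prior P(s):
P(s \mid e) = \frac{P(e \mid s)\, P(s)}{\sum_{s'} P(e \mid s')\, P(s')}
```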
So what we have in our example is a partial, or noisy, sensor, which is essentially the same thing: a sensor that counts the number of adjacent walls, and we give it an error of 0.1.
By the way, we also had a 0.1 error for the action uncertainty.
That's just a coincidence to keep the numbers easy; there's nothing that ties the accuracy of our actions to the accuracy of our sensors.
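As a sketch of what that sensor model could look like in code, assuming the standard 4x3 grid world from earlier in the course; the exact wall counts and how the 0.1 error mass is spread over wrong readings are my assumptions:

```python
# Possible noisy wall-count sensor, assuming the standard 4x3 grid world
# with an obstacle at (2, 2) and terminal states at (4, 2) and (4, 3).

WALL_COUNTS = {                       # true number of adjacent walls per state
    (1, 1): 2, (2, 1): 2, (3, 1): 1, (4, 1): 2,
    (1, 2): 2,            (3, 2): 1,
    (1, 3): 2, (2, 3): 2, (3, 3): 1,
}

SENSOR_ERROR = 0.1                    # probability of reporting a wrong count
READINGS = [0, 1, 2, 3]               # possible sensor outputs

def sensor_model(reading, state):
    """P(reading | state): the true count with probability 0.9, the 0.1
    error mass spread uniformly over the other readings (an assumption)."""
    if reading == WALL_COUNTS[state]:
        return 1.0 - SENSOR_ERROR
    return SENSOR_ERROR / (len(READINGS) - 1)
```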
So the immediate consequence is that the agent does not know which state it is in; it can't reliably say.
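All the agent can do is maintain a belief over states and update it as it acts and senses. A minimal filtering step at the belief level might look like this; the dictionary-based belief representation and the transition_model signature are assumptions, and sensor_model is passed in as a function like the one sketched above:

```python
def update_belief(belief, action, reading, states, transition_model, sensor_model):
    """One belief-level filtering step:
        b'(s') ∝ P(reading | s') * sum_s P(s' | s, action) * b(s)
    belief maps states to probabilities; transition_model(s_next, s, a)
    returns P(s_next | s, a); sensor_model(reading, s) returns P(reading | s)."""
    new_belief = {}
    for s_next in states:
        predicted = sum(transition_model(s_next, s, action) * belief[s]
                        for s in states)
        new_belief[s_next] = sensor_model(reading, s_next) * predicted
    total = sum(new_belief.values())  # normalize so the belief sums to 1
    return {s: p / total for s, p in new_belief.items()}
```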